feat(BA-5373): Add blue-green deployment infrastructure and promote API#10426
Draft
jopemachine wants to merge 12 commits into main from
Pull request overview
This PR adds infrastructure support for blue-green deployments by introducing an AWAITING_PROMOTION sub-step handler and wiring a manual “promote deployment” operation end-to-end (service → repository → GraphQL), including atomic route traffic switching and revision swap.
Changes:
- Added `DEPLOYING_AWAITING_PROMOTION` sub-step and a new `DeployingAwaitingPromotionHandler` to support the pause-before-promotion phase.
- Added `promoteDeployment` GraphQL mutation (DTOs, adapter, action, processor, service) for manual promotion.
- Implemented `DeploymentRepository.promote_deployment()` and extended strategy mutation plumbing to support “promote” route updates in the DB transaction.
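As a rough illustration of how a new sub-step handler plugs into the coordinator, here is a minimal Python sketch. It assumes a simple registry keyed by sub-step; `Coordinator`, `register`, `dispatch`, `awaiting_promotion_handler`, and the `DEPLOYING_PROVISIONING` member are hypothetical stand-ins, and only the `DEPLOYING_AWAITING_PROMOTION` name comes from this PR.

```python
from __future__ import annotations

import enum
from dataclasses import dataclass, field
from typing import Callable


class DeploymentLifecycleSubStep(enum.Enum):
    # Hypothetical subset: only DEPLOYING_AWAITING_PROMOTION is named in the PR.
    DEPLOYING_PROVISIONING = "deploying_provisioning"
    DEPLOYING_AWAITING_PROMOTION = "deploying_awaiting_promotion"


# A handler receives a deployment ID and returns a status message.
Handler = Callable[[str], str]


@dataclass
class Coordinator:
    # Maps each sub-step to the handler that processes deployments in it.
    handlers: dict[DeploymentLifecycleSubStep, Handler] = field(default_factory=dict)

    def register(self, step: DeploymentLifecycleSubStep, handler: Handler) -> None:
        # Refuse duplicate registrations so two handlers never race on one sub-step.
        if step in self.handlers:
            raise ValueError(f"handler already registered for {step.name}")
        self.handlers[step] = handler

    def dispatch(self, step: DeploymentLifecycleSubStep, deployment_id: str) -> str:
        return self.handlers[step](deployment_id)


def awaiting_promotion_handler(deployment_id: str) -> str:
    # Pause phase: leave the deployment untouched until promotion is triggered.
    return f"{deployment_id}: waiting for promotion"
```

Keying the registry by sub-step keeps the pause phase isolated: adding blue-green support only requires registering one more entry rather than branching inside the existing DEPLOYING handlers.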
Reviewed changes
Copilot reviewed 22 out of 22 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| src/ai/backend/manager/sokovan/deployment/handlers/deploying.py | Adds AWAITING_PROMOTION handler and adjusts DEPLOYING/PROVISIONING behavior. |
| `src/ai/backend/manager/sokovan/deployment/handlers/__init__.py` | Exports the new deploying handler. |
| src/ai/backend/manager/sokovan/deployment/coordinator.py | Registers the new DEPLOYING/AWAITING_PROMOTION handler. |
| src/ai/backend/manager/services/deployment/service.py | Adds promote_deployment() service method and route classification logic. |
| src/ai/backend/manager/services/deployment/processors.py | Wires the new promote action into processors/supported actions. |
| src/ai/backend/manager/services/deployment/actions/revision_operations/promote_deployment.py | Introduces the promote action + result types. |
| `src/ai/backend/manager/services/deployment/actions/revision_operations/__init__.py` | Exports the promote action types. |
| src/ai/backend/manager/repositories/deployment/repository.py | Adds promote_deployment() and extends apply_strategy_mutations() signature to include promote. |
| src/ai/backend/manager/repositories/deployment/db_source/db_source.py | Executes promote route updates as part of strategy mutation transaction. |
| src/ai/backend/manager/data/deployment/types.py | Adds DEPLOYING_AWAITING_PROMOTION to lifecycle sub-steps list. |
| src/ai/backend/manager/api/gql/schema.py | Registers promote_deployment mutation. |
| src/ai/backend/manager/api/gql/deployment/types/revision.py | Adds GraphQL input/payload types for promotion. |
| `src/ai/backend/manager/api/gql/deployment/types/__init__.py` | Exports promotion input/payload GraphQL types. |
| src/ai/backend/manager/api/gql/deployment/resolver/revision.py | Adds the promote_deployment mutation resolver. |
| `src/ai/backend/manager/api/gql/deployment/resolver/__init__.py` | Exports the new resolver symbol. |
| `src/ai/backend/manager/api/gql/deployment/__init__.py` | Re-exports new GraphQL types and resolver. |
| src/ai/backend/manager/api/adapters/deployment.py | Adds adapter method to trigger the promote action. |
| src/ai/backend/common/dto/manager/v2/deployment/response.py | Adds PromoteDeploymentPayload DTO. |
| src/ai/backend/common/dto/manager/v2/deployment/request.py | Adds PromoteDeploymentInput DTO. |
| docs/manager/graphql-reference/v2-schema.graphql | Documents the new mutation and input/payload types (also includes an unrelated schema change). |
| docs/manager/graphql-reference/supergraph.graphql | Same as above for the supergraph schema reference. |
Comments suppressed due to low confidence (2)
docs/manager/graphql-reference/v2-schema.graphql:2972
- The generated GraphQL reference removed `lastUsedAt` from `ImageV2MetadataInfo`, but the Strawberry schema still defines `last_used_at` (see `src/ai/backend/manager/api/gql/image/types.py:206`). This makes the published schema docs inconsistent with the actual API. Please regenerate these schema reference files from the current schema or revert the unrelated removal.
```graphql
type ImageV2MetadataInfo {
  """Config digest for verification."""
  digest: String
  """Image size in bytes."""
  sizeBytes: Int!
  """Image creation timestamp."""
  createdAt: DateTime
  """Timestamp of the most recent session created with this image."""
  lastUsedAt: DateTime
```
docs/manager/graphql-reference/supergraph.graphql:5323
- Same as `v2-schema.graphql`: `lastUsedAt` was removed from `ImageV2MetadataInfo` in the supergraph reference, but the Strawberry schema still exposes it. Regenerate or revert to keep the schema references consistent.
```graphql
type ImageV2MetadataInfo
  @join__type(graph: STRAWBERRY)
{
  """Config digest for verification."""
  digest: String
  """Image size in bytes."""
  sizeBytes: Int!
  """Image creation timestamp."""
  createdAt: DateTime
  """Timestamp of the most recent session created with this image."""
  lastUsedAt: DateTime
  """Parsed tag components."""
  tags: [ImageV2TagEntry!]!
```
Branch updated from `e3eca7a` to `97874ea`.
Branch updated from `00d7731` to `3d57556`.
- Add DeployingAwaitingPromotionHandler for AWAITING_PROMOTION sub-step
- Add promoteDeployment GraphQL mutation for manual blue-green promotion
- Add promote_deployment repository method with atomic route switch
- Wire promote through full stack: DTO, Action, Service, Processor, Adapter, GQL
- Add promote_route_ids to RouteChanges for blue-green traffic switch
- Add DEPLOYING_AWAITING_PROMOTION to DeploymentLifecycleSubStep

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: octodog <mu001@lablup.com>
…ns call
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…mote resolver
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… promote API
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… health check comparison
RouteStatus is a lifecycle enum (PROVISIONING, RUNNING, etc.) and does not have a HEALTHY member. The health check status uses a separate RouteHealthStatus enum with the HEALTHY attribute.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
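The distinction this commit relies on can be shown with two separate enums. This is a hypothetical sketch: the member subsets, their values, and the `is_healthy` helper are illustrative, and only the enum names and the `HEALTHY` member come from the commit message.

```python
import enum


class RouteStatus(enum.Enum):
    # Lifecycle state of a route (hypothetical subset of members).
    PROVISIONING = "provisioning"
    RUNNING = "running"
    TERMINATING = "terminating"


class RouteHealthStatus(enum.Enum):
    # Health-check result, tracked independently of the lifecycle.
    HEALTHY = "healthy"
    UNHEALTHY = "unhealthy"


def is_healthy(health_status: RouteHealthStatus) -> bool:
    # Compare against the health enum. RouteStatus has no HEALTHY member,
    # so writing RouteStatus.HEALTHY here would raise AttributeError.
    return health_status is RouteHealthStatus.HEALTHY
```

Keeping lifecycle and health in separate enums means a route can be `RUNNING` yet not `HEALTHY`, which is exactly the case a promotion check must exclude.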
Branch updated from `8a02693` to `5ba1db6`.
…ontext
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Guard tz-naive phase_started_at when computing auto-promote delay
- Reject manual promotion when no healthy green routes exist

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
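The tz-naive guard matters because Python raises `TypeError` when subtracting a naive `datetime` from an aware one. A minimal sketch, assuming naive timestamps are stored in UTC and the delay is a `timedelta`; `remaining_promote_delay` and its parameters are hypothetical names, not the PR's code.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def remaining_promote_delay(
    phase_started_at: datetime,
    auto_promote_delay: timedelta,
    now: datetime | None = None,
) -> timedelta:
    """Return the time left before auto-promotion (zero once it has elapsed)."""
    # Guard: treat a tz-naive timestamp as UTC (assumption) so it can be
    # compared with an aware "now"; naive minus aware would raise TypeError.
    if phase_started_at.tzinfo is None:
        phase_started_at = phase_started_at.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    elapsed = now - phase_started_at
    # Clamp at zero so callers can simply check "remaining == timedelta(0)".
    return max(auto_promote_delay - elapsed, timedelta(0))
```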
- Reject when deployment is not in AWAITING_PROMOTION
- Reject when deploying_revision_id is missing
- Reject when no healthy green routes exist
- Classify HEALTHY green routes as promote and active blue routes as drain

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
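The validation order and route classification described in this commit might look roughly like the following sketch. The `Route` and `PromotionPlan` shapes, the `is_active` flag, the string-typed `sub_step`, and the `PromotionRejected` exception are assumptions made for illustration; in the PR these checks live in the service's `promote_deployment()`.

```python
from __future__ import annotations

import enum
import uuid
from dataclasses import dataclass


class RouteHealthStatus(enum.Enum):
    HEALTHY = "healthy"
    UNHEALTHY = "unhealthy"


@dataclass(frozen=True)
class Route:
    route_id: uuid.UUID
    revision_id: uuid.UUID
    health_status: RouteHealthStatus
    is_active: bool  # hypothetical flag: route currently receives traffic


@dataclass(frozen=True)
class PromotionPlan:
    promote_route_ids: list[uuid.UUID]
    drain_route_ids: list[uuid.UUID]


class PromotionRejected(Exception):
    """Raised when a manual promotion request fails validation."""


def plan_promotion(
    sub_step: str,
    current_revision_id: uuid.UUID,
    deploying_revision_id: uuid.UUID | None,
    routes: list[Route],
) -> PromotionPlan:
    # 1. Promotion is only valid while paused in AWAITING_PROMOTION.
    if sub_step != "DEPLOYING_AWAITING_PROMOTION":
        raise PromotionRejected("deployment is not awaiting promotion")
    # 2. There must be a new (green) revision to promote.
    if deploying_revision_id is None:
        raise PromotionRejected("deploying_revision_id is missing")
    # 3. Green = healthy routes serving the deploying revision.
    promote = [
        r.route_id
        for r in routes
        if r.revision_id == deploying_revision_id
        and r.health_status is RouteHealthStatus.HEALTHY
    ]
    if not promote:
        raise PromotionRejected("no healthy green routes")
    # 4. Blue = active routes still serving the current revision.
    drain = [
        r.route_id
        for r in routes
        if r.revision_id == current_revision_id and r.is_active
    ]
    return PromotionPlan(promote_route_ids=promote, drain_route_ids=drain)
```

Checking the sub-step and revision before classifying routes means the cheap, state-level rejections fire first, and the route scan only runs for a deployment that is actually eligible.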
Branch updated from `6e8224f` to `2e2c2b7`.
Resolves BA-5373.
Summary
- `DeployingAwaitingPromotionHandler` for blue-green `AWAITING_PROMOTION` sub-step processing
- `promoteDeployment` GraphQL mutation for manual blue-green promotion
- `promote_deployment` repository method with atomic route switch (promote green → ACTIVE, drain blue → TERMINATING, swap revision)
- `promote_route_ids` to `RouteChanges` for blue-green traffic switch
- `DEPLOYING_AWAITING_PROMOTION` to `DeploymentLifecycleSubStep`

Context
This PR provides the infrastructure layer for the blue-green deployment strategy (BA-3436). The core strategy FSM (BlueGreenStrategy) is in a stacked PR on top of this one.
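The atomic route switch listed in the summary boils down to running all three updates in one transaction. The sketch below uses the standard-library `sqlite3` module purely for illustration; the `routes`/`deployments` tables, their columns, the `SCHEMA` string, and the `promote` function are hypothetical, and the actual repository works against the manager's own database layer.

```python
import sqlite3

# Hypothetical minimal schema for illustration only.
SCHEMA = """
CREATE TABLE routes (id TEXT PRIMARY KEY, status TEXT NOT NULL);
CREATE TABLE deployments (
    id TEXT PRIMARY KEY,
    current_revision TEXT,
    deploying_revision TEXT
);
"""


def promote(
    conn: sqlite3.Connection,
    deployment_id: str,
    promote_ids: list[str],
    drain_ids: list[str],
) -> None:
    """Switch traffic from blue to green routes in a single transaction."""
    # `with conn:` commits on success and rolls back on any exception,
    # so either all three updates are applied or none of them are.
    with conn:
        # Promote green routes so they start receiving traffic.
        conn.executemany(
            "UPDATE routes SET status = 'ACTIVE' WHERE id = ?",
            [(rid,) for rid in promote_ids],
        )
        # Drain blue routes.
        conn.executemany(
            "UPDATE routes SET status = 'TERMINATING' WHERE id = ?",
            [(rid,) for rid in drain_ids],
        )
        # Swap revisions: the deploying revision becomes the current one.
        cur = conn.execute(
            "UPDATE deployments"
            " SET current_revision = deploying_revision, deploying_revision = NULL"
            " WHERE id = ? AND deploying_revision IS NOT NULL",
            (deployment_id,),
        )
        if cur.rowcount != 1:
            # Raising inside the block rolls back the route updates above.
            raise LookupError(f"deployment {deployment_id!r} has no revision to promote")
```

Doing the revision swap last and checking its row count inside the same transaction means a missing or already-promoted deployment leaves every route status untouched.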
Test Plan
🤖 Generated with Claude Code
📚 Documentation preview 📚: https://sorna--10426.org.readthedocs.build/en/10426/
📚 Documentation preview 📚: https://sorna-ko--10426.org.readthedocs.build/ko/10426/